On Penalty Methods for Nonconvex Bilevel Optimization and First-Order Stochastic Approximation
In this work, we study first-order algorithms for solving Bilevel Optimization (BO) problems where the objective functions are smooth but possibly nonconvex in both levels and the variables are restricted to closed convex sets. As a first step, we study the landscape of BO through the lens of penalty methods, in which the upper- and lower-level objectives are combined in a weighted sum with penalty parameter $\sigma > 0$. In particular, we establish a strong connection between the penalty function and the hyper-objective by explicitly characterizing the conditions under which the values and derivatives of the two must be $O(\sigma)$-close. A by-product of our analysis is an explicit formula for the gradient of the hyper-objective when the lower-level problem has multiple solutions under minimal conditions, which could be of independent interest. Next, viewing the penalty formulation as an $O(\sigma)$-approximation of the original BO problem, we propose first-order algorithms that find an $\epsilon$-stationary solution by optimizing the penalty formulation with $\sigma = O(\epsilon)$. When the perturbed lower-level problem uniformly satisfies the small-error proximal error-bound (EB) condition, we propose a first-order algorithm that converges to an $\epsilon$-stationary point of the penalty function, using in total $O(\epsilon^{-3})$ and $O(\epsilon^{-7})$ accesses to first-order (stochastic) gradient oracles when the oracle is deterministic and when the oracles are noisy, respectively. Under an additional assumption on the stochastic oracles, we show that the algorithm can be implemented in a fully {\it single-loop} manner, i.e., with $O(1)$ samples per iteration, and achieves the improved oracle complexities of $O(\epsilon^{-3})$ and $O(\epsilon^{-5})$, respectively.
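The penalty viewpoint above can be illustrated with a toy sketch (this is not the paper's algorithm; the quadratic objectives, penalty weight, and step size below are all invented for the example). A bilevel problem $\min_x f(x, y^*(x))$ with $y^*(x) = \arg\min_y g(x, y)$ is replaced by joint gradient descent on the penalty function $f + \sigma g$, whose stationary point is $O(1/\sigma)$-close to a stationary point of the hyper-objective:

```python
# Toy penalty reformulation of a bilevel problem (invented for illustration):
#   upper level: f(x, y) = (x - 1)^2 + y^2
#   lower level: g(x, y) = (y - x)^2, minimized at y*(x) = x
# The hyper-objective F(x) = f(x, x) is minimized at x = 1/2.

sigma, lr = 100.0, 0.004  # penalty weight and step size, chosen for this sketch
x, y = 0.0, 0.0
for _ in range(10000):
    # gradients of the penalty function p(x, y) = f(x, y) + sigma * g(x, y)
    gx = 2.0 * (x - 1.0) - 2.0 * sigma * (y - x)
    gy = 2.0 * y + 2.0 * sigma * (y - x)
    x, y = x - lr * gx, y - lr * gy

# x lands O(1/sigma)-close to the hyper-objective minimizer 1/2, and y is
# O(1/sigma)-close to the lower-level solution y*(x) = x.
print(x, y)
```

With $\sigma = 100$ the joint stationary point sits within about $1/\sigma$ of $(1/2, 1/2)$, matching the $O(\sigma^{-1})$-closeness the abstract describes (stated there in terms of $\sigma$-weighted objectives).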
Tractable Optimality in Episodic Latent MABs
We consider a multi-armed bandit problem with $M$ latent contexts, where an agent interacts with the environment for an episode of $H$ time steps. Depending on the length of the episode, the learner may not be able to accurately estimate the latent context. The resulting partial observation of the environment makes the learning task significantly more challenging. Without any additional structural assumptions, existing techniques for tackling partially observed settings imply that the decision maker can learn a near-optimal policy with $O(A)^H$ episodes, but do not promise more. In this work, we show that learning with {\em polynomial} samples in $A$ is possible. We achieve this by using techniques from experiment design. Then, through a method-of-moments approach, we design a procedure that provably learns a near-optimal policy with $O(\mathrm{poly}(A) + \mathrm{poly}(M, H)^{\min(M, H)})$ interactions. In practice, we show that the moment matching can be formulated via maximum likelihood estimation. In our experiments, this significantly outperforms the worst-case guarantees, as well as existing practical methods.
Comment: NeurIPS 2022
Statistical learning with latent variables: mixture models and reinforcement learning
Statistical learning with missing or hidden information is ubiquitous in many practical problems. For example, the success of a certain medical treatment can largely depend on the unknown conditions of patients, or some parts of the data could be censored to protect the privacy of individuals. In contrast to problems with full information, which often have simple and tractable solutions, the existence of such latent variables often introduces intractable non-convexity, complicating the landscape of the problem from both statistical and computational aspects. While both aspects are important, this thesis puts more emphasis on understanding the statistical challenges raised by latent variables. This thesis consists of two main parts. In Part I, we consider parameter estimation in finite mixture models; in particular, we study the Expectation-Maximization (EM) algorithm for learning the maximum likelihood estimator (MLE) given i.i.d. samples from mixtures of Gaussian distributions. In Part II, we turn our focus to reinforcement learning (RL) and study the sample complexity of learning a near-optimal policy when an important context of the environment is not observable. We now give a brief overview of these two parts.

Part I. The first part of the thesis studies the convergence and statistical behavior of EM in two fundamental settings: one concerns a mixture consisting of two symmetric components, and the other concerns a mixture of an arbitrary number of well-separated components. Chapter 1 describes background and notation relevant to the two settings. In Chapter 2, we focus on a mixture of two linear regressions (2-MLR) and completely characterize the global optimality of EM: we show that, starting from any randomly initialized point, the EM algorithm converges to the true parameter at the known minimax statistical rates in all parameters under all signal-to-noise ratio (SNR) regimes.
In Chapter 3, we focus on two canonical mixture problems: a mixture of K ≥ 3 well-separated Gaussians (K-GMM) and a mixture of K ≥ 3 linear regressions (K-MLR). For these problems, we provide a rigorous (local) convergence guarantee for the EM algorithm when the mixture components are well separated. Notably, we establish the minimax statistical rate of EM (and thus of the MLE) in all problem parameters for these two examples.

Part II. In the second part of the thesis, we consider learning a near-optimal policy in Latent Markov Decision Processes (LMDPs). In an LMDP, an MDP is randomly drawn from a set of M possible MDPs at the beginning of the interaction, but the identity of the chosen MDP is not revealed to the agent. As a starting point, we show that a general instance of LMDPs with S states and A actions requires at least Ω((SA)^M) episodes to even approximate the optimal policy. This lower bound suggests that the problem is tractable only under additional assumptions, either in the regime where the number of contexts grows, M = ω(1), or when the number of contexts is small, M = O(1). We give a more detailed overview and background in Chapter 4. In Chapter 5, we first prove the Ω((SA)^M) lower bound in the absence of further assumptions. Then we focus on the M = ω(1) regime, where we consider sufficient assumptions under which learning good policies requires a polynomial number of episodes in M. We show that the key link is a notion of separation between the MDP system dynamics. With sufficient separation, we provide an efficient algorithm with a local guarantee, i.e., a sublinear regret guarantee when we are given a good initialization. The need for initialization can be removed if a certain statistical-sufficiency assumption holds. In Chapter 6, we consider learning a near-optimal policy in reward-mixing MDPs (RMMDPs), a special case of LMDPs with common state-transition probabilities across contexts.
Without any assumptions, no upper bound on the sample complexity is known even for this setting. We first study the problem of learning a near-optimal policy for two reward-mixing MDPs, i.e., M = 2. Unlike existing approaches that rely on strong assumptions on the dynamics, we make no assumptions and study the problem in full generality. Indeed, with no further assumptions, even for two switching reward models, the problem requires several new ideas beyond existing algorithmic and analysis techniques for efficient exploration. Finally, in Chapter 7, we show that the moment-matching idea can be applied when the system is in the simpler latent multi-armed bandit (LMAB) setting (equivalently, when S = 1). Unlike general LMDPs, which suffer an Ω(A^M) lower bound, we show that polynomial sample complexity in A is possible in LMABs.
Electrical and Computer Engineering
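The moment-matching idea for the latent bandit setting can be sketched in a stripped-down toy (not the thesis's estimator): assume M = 2 equally likely contexts and noiseless per-context rewards, so each arm's observed rewards follow an equal-weight two-point mixture whose support can be recovered from the first two moments. The arm means below are invented for the example.

```python
import math
import random

random.seed(0)

# Hypothetical toy: M = 2 equally likely latent contexts; an arm's reward
# equals its context-dependent mean (noiseless), so per-arm rewards are an
# equal-weight two-point mixture.
true_means = {1: [0.2, 0.9], 2: [0.8, 0.1]}  # context -> per-arm means
n_arms, n_episodes = 2, 20000

samples = {a: [] for a in range(n_arms)}
for _ in range(n_episodes):
    c = random.choice([1, 2])     # latent context redrawn each episode
    a = random.randrange(n_arms)  # pull a uniformly random arm
    samples[a].append(true_means[c][a])

recovered = {}
for a in range(n_arms):
    xs = samples[a]
    m1 = sum(xs) / len(xs)                 # empirical first moment
    m2 = sum(x * x for x in xs) / len(xs)  # empirical second moment
    # For an equal-weight two-point mixture {u, v}: m1 = (u + v) / 2 and
    # m2 = (u^2 + v^2) / 2, hence u, v = m1 -/+ sqrt(m2 - m1^2).
    gap = math.sqrt(max(m2 - m1 * m1, 0.0))
    recovered[a] = sorted([m1 - gap, m1 + gap])
```

Note the per-arm marginals alone do not reveal which recovered value belongs to which context across arms; resolving that pairing (and handling reward noise and unknown mixing weights) is where the heavier machinery of the full method-of-moments analysis comes in.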
Clustering and Classification [Project Title from Cover]
DTRT13-G-UTC58
The Expectation-Maximization algorithm is perhaps the most broadly used algorithm for inference in latent variable problems. A theoretical understanding of its performance, however, largely remains lacking. Recent results established that EM enjoys global convergence for Gaussian Mixture Models. For Mixed Regression, however, only local convergence results had been established, and those only for the high-SNR regime. We show here that EM converges for mixed linear regression with two components (it is known not to converge for three or more), and moreover that this convergence holds under random initialization.
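A minimal sketch of the EM iteration for symmetric two-component mixed linear regression, in one dimension for brevity (the parameter values, sample size, and initialization below are invented for the example): data follow y = s · β·x + noise with the sign s equally likely to be ±1, and EM recovers β up to sign.

```python
import math
import random

random.seed(1)

# Generate data from a symmetric 2-component mixed linear regression:
# y = s * beta_true * x + noise, with s uniform on {-1, +1}.
beta_true, noise_sd, n = 2.0, 0.5, 5000
data = []
for _ in range(n):
    x = random.gauss(0.0, 1.0)
    s = random.choice([-1.0, 1.0])
    data.append((x, s * beta_true * x + random.gauss(0.0, noise_sd)))

beta = 0.5  # arbitrary initialization; the result above says random init suffices
var = noise_sd ** 2
for _ in range(50):
    # E-step folded into the M-step: if w_i is the posterior probability of
    # the +1 component, then 2*w_i - 1 = tanh(y_i * beta * x_i / var), and
    # the weighted least-squares M-step reduces to the ratio below.
    num = sum(math.tanh(y * beta * x / var) * y * x for x, y in data)
    den = sum(x * x for x, y in data)
    beta = num / den

print(abs(beta))  # close to beta_true (the model is identifiable only up to sign)
```

The tanh form avoids explicitly computing the per-sample responsibilities (and the numerical overflow a naive sigmoid of large arguments would cause); it is algebraically identical to the standard E-step/M-step pair for this symmetric model.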